Index Compression Using Fixed Binary Codewords
Abstract
Document retrieval and web search engines index large quantities of text. The static costs associated with storing the index can be traded against dynamic costs associated with using it during query evaluation. Typically, index representations that are effective and obtain good compression tend not to be efficient, in that they require more operations during query processing. In this paper we describe a scheme for compressing lists of integers as sequences of fixed binary codewords that has the twin benefits of being both effective and efficient. Experimental results are given on several large text collections to validate these claims.
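As a rough illustration of the general idea of storing integer lists as fixed-width binary codewords, the sketch below converts a posting list into d-gaps and packs them block by block, with each block using the smallest width that fits its largest value. It is not the scheme described in the paper: the function names, the block size of 4, and the (width, count, packed) block layout are all assumptions made for this example.

```python
def to_gaps(postings):
    """Convert a strictly increasing list of document numbers into d-gaps."""
    gaps, prev = [], 0
    for doc in postings:
        gaps.append(doc - prev)
        prev = doc
    return gaps


def from_gaps(gaps):
    """Inverse of to_gaps: rebuild document numbers by a running sum."""
    postings, total = [], 0
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings


def encode_blocks(values, block_size=4):
    """Pack non-negative integers into blocks of fixed-width binary codewords.

    Hypothetical layout: each block is a (width, count, packed) tuple, where
    every value in the block occupies exactly `width` bits of `packed`.
    """
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Smallest width that can hold the largest value in this block.
        width = max(1, max(v.bit_length() for v in block))
        packed = 0
        for v in block:
            packed = (packed << width) | v
        blocks.append((width, len(block), packed))
    return blocks


def decode_blocks(blocks):
    """Unpack blocks produced by encode_blocks using shifts and masks only."""
    values = []
    for width, count, packed in blocks:
        mask = (1 << width) - 1
        for i in range(count - 1, -1, -1):
            values.append((packed >> (i * width)) & mask)
    return values


postings = [3, 7, 8, 15, 16, 17, 300, 1000]
blocks = encode_blocks(to_gaps(postings))
restored = from_gaps(decode_blocks(blocks))
```

Because every codeword within a block has the same width, decoding needs only a shift and a mask per value, which is the kind of efficiency argument the abstract makes. Choosing the width per block rather than per list keeps a single large gap from inflating the cost of the small gaps elsewhere in the list.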
Similar resources
A Post-Processing Mechanism for Sequential Use of Static and Dynamic Enumerative Code
A bijection between a complete set of source words and a complete set of codewords defines a variable-to-variable length (VV) source code. Such code is used to parse sequentially a source sequence into codewords. In a naive parsing of a finite source sequence, the last incomplete source word requires a separate post-processing. However, if the sizes of the source and the code alphabet are the s...
An adaptive incremental LBG for vector quantization
This study presents a new vector quantization method that generates codewords incrementally. New codewords are inserted in regions of the input vector space where the distortion error is highest until the desired number of codewords (or a distortion error threshold) is achieved. Adoption of the adaptive distance function greatly increases the proposed method's performance. During the incrementa...
Spatial Image Watermarking by Error-Correction Coding in Gray Codes
In this paper, error-correction coding (ECC) in Gray codes is considered and its performance in the protecting of spatial image watermarks against lossy data compression is demonstrated. For this purpose, the differences between bit patterns of two Gray codewords are analyzed in detail. On the basis of the properties, a method for encoding watermark bits in the Gray codewords that represent sig...
On a Class of Constant Weight Codes
For any odd prime power q we first construct a certain non-linear binary code C(q, 2) having (q² − q)/2 codewords of length q and weight (q − 1)/2 each, for which the Hamming distance between any two distinct codewords is in the range [q/2 − 3√q/2, q/2 + 3√q/2], that is, ‘almost constant’. Moreover, we prove that C(q, 2) is distance-invariant. Several variations and improvements on this theme are...
Robust Image and Video Coding with Pyramid Vector Quantisation
Most current image and video coding standards use variable length codes to achieve compression, which renders the compressed bitstream very sensitive to channel errors. In this paper, image and video coders based on Pyramid Vector Quantisation (PVQ) and using only fixed length codes are proposed. Still image coders using PVQ in conjunction with DCT and wavelet techniques are described and their...